{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Supervised Imitation Learning \n",
    "\n",
    "\n",
    "In the imitation learning setting, our goal is to mimic the expert. Say, we want to train our agent to drive a car, instead of training the agent from scratch by interacting with the environment we can train them with the expert demonstrations. Okay, what are expert demonstrations? Expert demonstrations are a set of trajectories consists of state-action pairs where each action is performed by the expert.\n",
    "\n",
    "We can train the agent to mimic the actions performed by the expert in the respective states. Thus, we can view expert demonstrations as training data to train our agent. The fundamental idea of imitation learning is to imitate (learn) the behavior of an expert.\n",
    "\n",
    "One of the simplest and naive ways to perform imitation learning is by treating the imitation learning task as a supervised learning task. First, we collect a set of expert demonstrations and then we will train a classifier to perform the same action performed by the expert in a particular state. We can view this as a big multiclass classification problem and train our agent to perform the action performed by the expert in the respective state.\n",
    "\n",
    "\n",
    "Our goal is to minimize the loss $L(a^*, \\pi_{\\theta}(s)) $ where $a^*$ is the expert action and $\\pi_{\\theta}(s) $ denotes the action performed by our agent.\n",
    "\n",
    "Thus, in supervised imitation learning, we perform the following steps: \n",
    "\n",
    "* Collect the set of expert demonstrations\n",
    "* Initialize a policy $\\pi_{\\theta}(s) $\n",
    "* Learn the policy by minimizing the loss function $L(a^*, \\pi_{\\theta}(s)) $\n",
    "\n",
    "However, there exist several challenges and drawbacks with this method. The knowledge of the agent is limited only to the expert demonstrations (training data) so if the agent comes across a new state which is not present in the expert demonstrations then the agent will not know what action to perform in that state. \n",
    "\n",
    "Say, we train the agent to drive a car using supervised imitation learning and let the agent perform in the real world. If the training data has no state where the agent encounters a traffic signal then our agent will have no clue about the traffic signal. Also, the accuracy of the agent is highly dependent on the knowledge of the expert. If the expert demonstrations are poor or not optimal then the agent cannot learn correct actions or optimal policy. \n",
    "\n",
    "To overcome the challenges in supervised imitation learning, we introduce a new algorithm called DAgger. In the next section, we will learn how DAgger works and how it overcomes the limitations of the supervised imitation learning. \n",
    "\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
